Shelving buffer

A shelving buffer is a technique used in computer processors to increase the efficiency of superscalar processors.

Background

A Superscalar processor allows the execution of a number of instructions simultaneously in the core of the processor itself, although this behavior is not to be confused with a multi-processor system. Most modern processors are superscalar.

Problems with Data Dependencies

Executing instructions in parallel (i.e. simultaneously) raises problems with data dependencies, meaning that some instructions may be dependent on the results of others, and hence care must be taken to execute in the correct order.

Take for example these sequence of instructions:

r1 = r2 + r3
r7 = r1 + r4

We have a RAW (Read After Write) data dependency here, meaning that we must wait for instruction 1 to finish before executing instruction 2, as we require the correct value of r1 (register 1). Hence these instruction cannot be executed simultaneously.

How it works?

With a superscalar processor, the instruction window of the processor fills up with a number of instructions (known as the issue rate). Depending on the scheme that the superscalar processor uses to dispatch these instruction from the window to the execution core of the CPU, we may encounter problems if there is a dependency not unlike the one shown above.

Consider an instruction window 3 instructions wide, containing i1, i2, i3 (instructions 1,2 & 3). Suppose that i2 is dependent on an instruction that has not yet finished executing, and we cannot execute it yet.

Without the use of a shelving buffer, the superscalar processor will execute i1, wait until i2 can be executed and then execute i2 and i3 simultaneously.

However with the use of a shelving buffer, the instruction window will be emptied into shelving buffers regardless of contents. The processor will then search for an appropriate number of instructions in the shelving buffers that can be executed in parallel (i.e. with no dependencies).

Hence the processor has a greater chance of running the maximum number of instructions simultaneously, and maximising throughput.